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ABSTRACT 

We consider the shape of the Hkehhood and posterior surfaces to be used when fitting 
cosmological models to CMB temperature and polarisation power spectra measured from 
experiments. In the limit of an all-sky survey with Gaussian distributed pixel noise we show 
that the true combined likelihood of the four CMB power spectra (TT, TE, EE & BB) has a 
Wishart distribution and we discuss the properties of this function. We compare various fits 
to the posterior surface of the Cjs, both in the case of a single auto-power spectrum and for a 
combination of temperature and polarisation data. In the latter case, it is important that the fits 
can accurately match the Wishart distribution in the limit of near full-sky coverage. Simple 
extensions of auto-power spectrum fits to include polarisation data generally fail to match 
correlations between the different power spectra in this limit. Directly fitting pixel values on 
large scales, as undertaken by the WMAP team in their analysis of the 3 year data, avoids 
the complications of characterising the shape of the posterior for the power spectra. Finally 
we demonstrate the importance of the likelihood distribution on analytic marginalisation, and 
provide a formula for analytic marginalisation over a calibration error given an all-sky survey. 

Key words: methods: statistical - methods: analytical - cosmology: theory - cosmic mi- 
crowave background 



1 INTRODUCTION 

In this era of precision cosmology iSoergel et alj|2003l 120061) . it is vital that care is taken with every step involved in the analysis and 
interpretation of cosmological data. In this paper we consider the likelihood tec hnique used to fit cosmological models to CMB power 
spectra. For their analysis of the 3-year data fHinshaw et al.i 200a : IPage et al '200o). the WMAP team have adopted a pixel-based likelihood 
analysis at low multipoles, thus avoiding the complications introduced by fitting to the shape of the posterior surface i Slosar et al. 2004 ). 
This posterior surface is strongly non-Gaussian which must be accounted for when performing model comparisons using power spectra. In 
addition, there are now important constraints on both the temperature and polarisation power spectra, which are not independent and need to 
be jointly analysed. 

In this paper we review previous work analysing the posterior surface for temperature power spectra and extend this to include polar- 
isation data. Initially, we present exact formulae for all-sky surveys with negligible noise. The inclusion of isotropic uncorrelated Gaussian 
distributed pixel noise will not change the form of the posterior surface for the Cjs as it will simply increase the variance in the ajmS - to 
calculate the Cjs we are still summing the squares of Gaussian random variables. However, an incomplete sky map will change the posterior 
distribution, causing correlations between modes, and will also change the posterior shape. For any survey, the central limit theorem can be 
invoked to show that at large /, the likelihood distribution will tend to a multi-variate Gaussian form. We therefore see that the true likelihood 
win interpolate between a skewed distribution on large scales, and a multi-variate Gaussian distribution on small scales. Fitting formulae 
which are able to match this intrinsic change in shape have previously been adopted to provide an approximate likelihood calculation for a 
single auto-power spectrum. The primary aim of our paper is to extend these fits to the combination of temperature and polarisation data. 

The layout of our paper is as follows. In Section|2|we lay the groundwork by briefly reviewing the standard Bayesian approach to model 
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selection, using Bayes theorem to link the likelihood to the posterior of interest. In Section IsTI we review the well known likelihood and 
posterior distributions for the TT power spectrum of an all-sky noiseless survey. This work is expanded in Section lT2l to include polarisation 
data in the same limit of no noise and an all-sky survey - the combined likelihood is given by the Wishart distribution. The important 
properties, including the multi-variate Gaussian limit at large / are presented. In Section |4] we discuss the complications introduced by 
realistic surveys on the posterior shape. These complication s have led previous studies to consider posterior fits for the TT power spectrum, 
the most common of which is the log-normal distribution c Bond. Jaffe & Knox 2000). A number of these fits to the posterior surface are 
compared in Section lsTTI The extension of these fits to the joint analysis of temperature and polarisation data is considered in Section ls!2l 
The importance of this work setting the mathematical foundation for the shape of the posterior surface is demonstrated in Section|6| where we 
consider marginalising over a Gaussian distributed calibration error in the TT power spectrum. The difference between assuming a Gaussian 
posterior, commonly used in the literature, and using the correct distribution for an all-sky survey is demonstrated. We discuss our results in 
Section^ 

2 LIKELIHOODS AND POSTERIORS 

Although it is probably the most reprinted equation in scientific literature, it is central to the work presented here, so for completeness we 
include the standard Bayesian equation 

/(X|X) = /mm 

which relates the likelihood function /(XjX) ~ the distribution of the data X given a model X, to the posterior /(XjX) ~ the distribution 
of models given the data. The prior /(X), which cannot be avoided, provides the information that we already know about the models. For 
example, our prior might be that only 6-parameters are needed to model present CMB data, and that we initially know nothing about those 
parameters - they themselves have uniform priors. It is probably worth emphasising that the distributions of the likelihood and posterior can 
have different forms. For example, if the likelihood is Gaussian, but has a variance that depends on the model, then the posterior will not be 
Gaussian, for a uniform prior on the models. We must therefore take care to distinguish likelihood and posterior. 

In this work, we will quantify the data X using the power spectra measured from some experiment, and the models X by the same 
statistic. For an all-sky survey, where different modes are independent, the likelihood is 

/(X|X)=[]/(Ci^^|Cf^), (2) 

I 

where Cj"^ represents the measured power spectra, Cj"^ the model power spectra, and XX — TT, EE, TE, BB. The posterior that we are 
interested in, /(X|X) is dependent on the product of the likelihoods of individual multipoles. For a single multipole, we see that it is the 
dependence of the likelihood on the model power spectrum {Cj"^) that is of interest. In this paper we refer to f{Cf'^\Cf'^) as the posterior 
for the power spectra, as it is related to f {Cj"^\Cf'^) using Bayes theorem and a uniform prior on f{Cf"'^). For a given experiment, it is the 
posterior f (Cf'^\Cf'^) that tells us the model constraints provided by the data on a particular scale. 

It is worth emphasising that a cosmological likelihood analysis based on the work presented in our paper does not depend on the 
assumption of a uniform prior in the power spectra. Instead, in such an analysis, the prior would be defined by the set of allowed models. 
In this paper we do not consider a specific set of cosmological models, and simply focus on the dependence of the likelihood on our chosen 
model statistic - the model power spectrum Cf^ . We have decided to call this the posterior, simply because it can be considered in this way 
following the assumption of a uniform prior in f{Cf''^). Had we quantified the models by a different statistic, h{Cf'^), then the prior on the 
allowed set of cosmological models /(X) is unchanged - if we considered a grid of cosmological models, then this grid is unchanged. The 
likelihood for each I would have to be multiplied by dh{Cf^) / dCj"^ to allow for the change of variables, but we see that this is independent 
of the model values. Consequently, as we would hope, the same posterior distribution for the set of cosmological models would be recovered 
whatever statistic is used to quantify the models, provided there is no loss of information associated with this choice. 

3 LIKELIHOOD AND POSTERIOR DISTRIBUTIONS FOR ALL-SKY NOISELESS SURVEYS 
3.1 Temperature only data 

In an all-sky, noiseless CMB survey, the Spherical Harmonic coefficients a^ of the temperature fluctuations obey a Gaussian distribution 



.f(aL|CD = ^===exp 



2C,TT 



(3) 



aj^\^). Using statistical isotropy, we can average over m and define an estimator of the power 

m 

The sum of the squares of v standard Gaussian random variables has a x distribution with v degrees of freedom. Consequently, 
Yi = ^ \aim/ yCj'^\ will have a x distribution with v = 21 -\-l degrees of freedom 
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f(Yi\Cj 
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(5) 



Cf^ is a multiple of Yi, Cj'^ — Cj'^Yi/v so the likelihood will have a F distribution 



f{cr\c7^) « cj 



c 



qTT 



exp 



'^cr 



2CTT 



(6) 



The mean and variance of this distribution are Cf^ and 2(Cj'^^'^ jv. Note that the maximum of this distribution occurs at Cf^ — (y — 
2)/vCj'^ , not at the mean value. In the limit v -^ oo, the central limit theorem gives that this distribution tends towards a Gaussian with the 
same mean and variance. 

Using Bayes theorem to convert to the posterior f{C'^'^\C'j^'^), assuming a uniform prior in f{Cj'^) gives 



.f{c7^\c7^)o,{cr) 



■TTX-I//2 



exp 



2CTT 



(7) 



where we have not included the normalisation factor dependent on Cj'^ . The maximum of this function occurs at Cj'^ , while the mean is 
given by 



{CD = 



u~~8 



C/ 



(8) 



This offset between maximum and mean is simply a result of the skewness of the distribution. As i^ ^ oo, this distribution will tend towards 
a Gaussian form. This follows from the Bayesian identity and the fact that /(Cj^^jC;^^) tends to a Gaussian distribution (see the discussion 
following equation OIH . 

We can take the logarithm of the posterior to give 



- ln/(Cr|Cr) = ^ (inCr + C^/CD , 
ignoring an irrelevant additive constant. From this, we can calculate the curvature around the distribution maximum 



(9) 



d2in/(cncr) 



d^C^ 



(CTT)2 



(10) 



There is an alternative way of deriving equation ^ by considering the joint probability density of a[^ for —l^m^l. 



fiO-l,: 



T 



C7^)^Ylf{aJm\Cl 



(11) 



Substituting for f{a1^\C'7'^) 



from equation 0, this reduces to 



ln/(Cncr) 



|E0-^'"" + 



a-i 



T ,2/^TTX 



(12) 



where once again, we have ignored an irrelevant additive component. We see that this distribution is only dependent on a^^ through Cj''^ , 
and that substituting in C^'^ from equation ^4} leads to the same Cj^^ dependence that we had in equation J9j. 

3.2 Including polarisation data 

If we now include E-mode and B-mode polarisation data, there are 3 spherical harmonic coefficients of interest, aj^, af^ and af^. These 
are multivariate Gaussian random variables with expected values of zero. The data vector Xa and covariance matrix V; for the multivariate 
Gaussian can be written as 



Xa 



Vi 



c7 
c7 



B 







cr 
cr 





(13) 



Note that the cross-correlations between different parity fields are expected to be zero (and B has the opposite parity to E and T). The 
random variables of interest are the elements of the matrix 



s-^E^»x^- 



c? 
c7 



cl 
cf 



ci 
cf 



(14) 



where X^ represents the Hermitian conjugate of Xa. The matrix S; has a Wishart distribution with ly = {21 + 1) degrees of freedom. There 
is a slight complication caused by defining S; as the average over the (2/ + 1) modes rather than the sum, which is used to derive the standard 
Wishart distribution. Si still has a Wishart distribution, but we need to consider the matrix W; = Wi/{21 + 1) rather than Vj. The Wishart 
distribution is given by 
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4 W.J. Percival & M.L. Brown 

\Si\^---P-''y^ exp [-trace{W-''Si/2)] 

■^(^'■^'^ ^ 2W?rw7p5rvW2) ' ^^^^ 

where S; and W; are positive definite symmetric p x p matrices, v > p, and Tp(v/2) is tiie multi-variate Gamma function, 

V 

rp(i./2) = 7r^(^-i)/4 J]^ r [(^ + 1 - j)/2] . (16) 

Because of the form of Vj, we can decompose into two Wishart distributions, one with p = 2 for Cf^ , Cf^, C™ and a separate, 
independent, p — 1 Wishart distribution for Cf^, which reduces to a F distribution as described in the previous Section. To help to understand 
the form of the Wishart distribution, we now explain how it is normalised, focusing on the p — 2 distribution covering Cj'^ , Cf^ and Cj^ . 
The probability density function for S; is equivalent to the joint distribution of the elements of the matrix, over all positive definite matrices 



dCr dCr dQ™/(SdVO = l. (17) 



As expected, the marginal distributions of the diagonal elements of S; have a F distribution as described in Section l3!T1 However, the 
same is not true for the off-diagonal elements. First, suppose that we have obt ained Cj'^ and Cj^, but for some reason not Cf^ (as for 
example with the WMAP year 1 data lSennett et aliboOStlHinshaw et all2003l) . The joint likelihood of C'^'^ and C]^^ can be obtained by 
integrating equation <15> . with p — 2, over Cf^ 
< Cj'^ < oo, and Cj^ with the constraint — cxj < Cl 



=.TE\2^TT 



^ ^qTt^ee _ 2C7''cr + ^^' ^ ^' 



2\V\ \ (^TT 



.(18) 



r/^TT ATEi^TT ^TE ^EEx _ V K^l ) 

0FF(i./2)2^1\/li/2 (C,TT){-' l)/2 

It is interesting to note that a constraint on C^"^ and (7™ still leaves us with a likelihood that is dependent on Cf ^ - information is retained 
in this case. In fact, were we presented with an all-sky survey with negligible noise, then the likelihood analysis of any two or more of the 
three possible E and T mode auto- and cross-power spectra should be attempted using the full matrix V 
By integrating equation <18> over Cl 



f{c7^\cr,cT^,cF^) 



^r{u/2)2^\V\'/^ 



/^TE\2 n (■'-l)/-! /../=<TE^TE\ /,,|/^TE| /,-tTTr'EE 



TT/^EE 



crc, 



exp ( ^-^^^^ I K(._„/. I -l^'^lx^^ j , (19) 



where K,i is a modified Bessel function of the second kind, and \V\ = Cj^^C™ — (C™)^. This marginal distribution covers the interval 
— oo < C™ < oo. Again it is worth emphasising that the likelihood is dependent on C^'^ and Cp^ in addition to C™. In a Bayesian 
analysis, the model constraint provided by a measurement of C™ depends on all of these model values. 

The marginal distrib utions of Cj'^ , Cf^, Cj^ and Cf^ are plotted in Fig.Qfor the best-fit cosmological model of the WMAP year-1 
data iSoergel et alJl2003» where we have included a B-mode polarisation component with an input tensor-to-scalar ratio of T/S — .05. 
These distributions we re calculated from a ll-sky realisations of cosmological power spectra calculated using the CMBFAST I Seliak & Zal darriagal 
Il996h and HEALPIX JGorski et all2005h packages. For comparison we plot the distributions predicted by equation J6j for the auto power 
spectra and equation <19> for the TE cross power spectrum, which show excellent agreement with the simulated data. 

For the Wishart distribution, if we define the data vectors of interest by 



Xc = CT^ , Xo = Cr , (20) 





then the covariance matrix for Xc is given by 

2{crr 2crcr ^{crr 

2crcr crcr + {cr? 2c™cf ^ i . (21) 

2[Crf 2Q™Cf^ 2(C7f^)2 

As expected, the variance for the auto-power spectra matches that of the F distribution discussed in equation ^. However, the variance of 
the distribution in Q has a different form, reflecting the change in marginalised distribution (equation <19t rather than a F distribution). 

In the limit u -^ oo, the Wishart distribution tends towards a multi-variate Gaussian form, with the same covariance matrix. It is worth 
noting that the matrix Y is also the curvature matrix around the distribution maximum, which will become important in Sections fS . 1 l and l?!2l 



4 COMPLICATIONS FOR MORE REALISTIC SURVEYS 

As discussed in the previous Section, the posterior distribution for a noise-less all-sky survey does not have a multi-variate Gaussian form 
in the auto- and cross- power spectra at low v. The situation is more complicated for realistic data that includes effects such as noise, beam 
uncertainties, calibration errors and limited sky coverage. Such effects are often dealt with by modifying the posterior distribution to match 
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Figure 1. The distribution of recovered Ci—r calculated for the best-fit cosmological model to the 1-year WMAP data (histograms). A B-mode polarisation 
component was included in the simulations using a tensor-to-scalar ratio ofT/S = 0.05. The recovered means, which coincide with the input model spectra, 
are shown as the vertical dashed lines. These distributions are shown to be in excellent agreement with the predictions provided in Section lT2l (smooth curves). 
Note that the marginalised distribution for the TE cross power spectrum more closely resembles a Gaussian distribution than do the F distributions of the auto 
power spectra. 



that found from detailed simulations of the particular experiment being considered (e.g. lHivon et all2002h . This modified distribution can, i n 
turn, be fitted by simple functional forms, thus allowing rapid calculation for any given cosmological model (e.g. Bond, Jaffe & Knox 2000). 
In this Section we briefly examine the effect of real- world complications on the posterior, which provide the motivation for our consideration 
of possible fits for the posterior distribution in Section|5| 



4.1 Partial sky coverage 

First we simplify the analysis by ignoring noise and just considering the effect of partial sky coverage. Here, the spherical harmonic coeffi- 
cients of the cut sky aim are related to the true spherical harmonic coefficients by 



O-lm — / ^ KimVm'0-l'm' , 



(22) 



where Ki„,iim' i s a kernel which describes coupling between modes introduced by the non-uniform sky weighting JHivon et alj|2002l : 
iKogut et alj|2003h . It is informative to consider why the Wishart distribution is not valid for these modes. To see this, we focus on the TT 
power spectrum. For the cut sky coefficients for a particular mode, the estimator, C^"^ , equation J4}, now sums over a linear combination 
of the aim- We can work around correlations between the dim by defining uncorrelated combinations (which must be independent because 
they also form a multi-variate Gaussian distribution). However, the distribution will consist of variables with differing variance, and therefore 
does not lead to a Wishart distribution. Additionally, modes at different I will be correlated, again causing deviations from the analysis in 
SectionlT2l 

The introduction of any sky-cut to a CMB dataset renders the coupling kernel, K of equation <22> . singular, and thus prohibits the use 
of equations J4j & <22> to estimate the underlying power spectra. Various methods ha ve therefore been d eveloped for estimating CMB power 
spectra from cut-sky datasets, the most prominen t of which are the pseudo-C; (PCL. lHivon et all2002l) and quadratic maximum likelihood 
(OML. lTegmarkll997LlBond. Jaffe & Knoxll998l) methods. 

There are two regimes that we can easily analyse - given a severe sky cut, both the PCL and QML estimators will modify the likelihood 
distribution from the full-sky Wishart distribution of Section IT2I However, on small scales, we can still apply the central limit theorem to 
show that the likelihood of the polarisation and temperature power spectra has a multi-variate Gaussian distribution for both estimators. The 
covariance matrix of this Gaussian distribution will change as a severe sky-cut gives rise to the loss of information - some linear combinations 
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of the full-sky aim will define modes that lie within the cut sky and these modes cannot be recovered from the observed aim- The net result 
on the likelihood distribution is a reduction in the number of degrees of freedom of the system. Consider, for example, the PCL estimator for 
the temperature power spectrum defined by 

Ci^^M-,'di,, (23) 

I' 

where Ci = jixrl'^iml^ is the pseudo-power spectrum measured directly on the cut sky, and Mni is the Ci coupling matrix defined by, 

I" ^ ' 

Here, W; is the power spectrum of the weight function, VK(fi) applied to the data (e.g in the simplest case, VK = 1 for observed pixels; 
VK = otherwise) and the term in parentheses is the Wigner 3-j symbol. On small sca les, the recovered power spectrum Ci will still have a 
r distribution but with the number of degrees of freedom reduced from i/ = 2i + 1 to JHivon et all2002h . 

i.^« = (2; + l)/,kyj^, (25) 

where Wi is the i"* moment of the weighting scheme employed and /sky is the fraction of the sky having non-zero weighting. If we ignore 
the effect of the weighting scheme, we see that the covariances of the power spectra will be the standard all-sky covariances divided by /sky 
- on small scales the number of pairs of data points simply scales as the fraction of sky covered. 

The other regime that can be easily analysed is a modest sky cut on large scales. In this regime, the QML estimator approximates the 
exact estimator of equation J4} with estimates of the full-sky a;™ 's given by 

aim = 2Z1 -^iml'm'"''™'' ^"^^^ 

I'm' 

where K is a non-singular matrix formed by truncating the coupling kernel, K of equation <22t at finite values of I' and m' JEfstathiod 
l2004al) . This truncation of the coupling kernel is possible since, at low multipoles, the aim will only be weakly correlated with any of the 
aim which lie within the cut sky. lEfctathioul i20 04a) has demonstrated, using simulations, that the QML estimator recovers the true Ci, at 
low /, almost exactly in the presence of a modest sky cut (/sky = 0.83). For these QML Ci estimates, data at different / are independent and 
the likelihood distribution remains a Wishart distribution as described in Section lT2l In addition, the (co)variances of the QML estimates 
remain those of equation <21> . 

Clearly, we are going to be interested in both of these regimes, and in analysing data in between. Note that, in this latter regime, the PCL 
estimator is known to be sub-optimal and the variances of PCL estimates would be significantly increased by estimator-induced variances. 

The estimation of Ci is obviously coupled with the posterior surface that should be assumed for any model. For instance a "bias" when 
a Gaussian posterior is assumed is removed when assuming the correct shape of surface. The issue of whether different analysis methods 
introduce further Gaussian or non-Gaussian noise is more important for our present work. 



4.2 Including noise and beam smearing 

Here, we consider an all-sky CMB survey, with additive noise, Ni and symmetric Gaussian beam smoothing, Bi. A single (independent) 
auto-power spectrum measurement can now be written as Di — Ci + Ni/Bf, where we have dropped the explicit dependence on TT, EE 
or BB - when we do this, the formulae are valid for any of these three auto-power spectra. Note that Di represents measured spectra which 
have been corrected for the effect of the beam, Bi but have not been corrected for the noise bias, A^; . 

At this point we need to make a distinction between isotropic and anisotropic noise. For isotropic and uncorrelated Gaussian distributed 
pixel noise, the aim will be independent for different I, m. However, this is not the case for anisotropic noise, where the noise level changes 
from pixel-to-pixel. In this latter situation, we have, in effect, a noise window function that will induce correlations between the aim- In the 
former case of isotropic pixel noise, A'^i = const and our data vector (of which we are calculating the power) is still expected to have a 
Gaussian distribution. In this case, the marginalised distribution of Di still has a V distribution, and the posterior is altered from equation J9), 
becoming ISond. .Taffe & Knoxl200(il) 



■21n/(AiA) = (2; + l) 



The curvature around the posterior maximum is 



(27) 



dCidCv 



oc 



{Ci+Ni/Bty'Siv, (28) 



and the error on our model power spectrum Ci is proportional to (C; + Ni/Bi ). 

In the case of a joint analysis of temperature and polarisation data, for isotropic noise, the matrix Sj, equation J14t . will still obey the 
Wishart distribution of equation <15> . but with a revised Wi matrix given by Wj = V[/(2Z + 1) where Vj is now given by 
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V;=( Cr Cr + Nf/^Bjf . (29) 

Cr+Nf/{Bjf I 

In equation i29), we have now allowed for different beam widths (Bj & Bf) and noise levels {N"^ & Nf^ = Nf^ = Nf) for the 
temperature and polarisation data. Note that for uncorrelated Gaussian distributed noise, the noise cross power spectrum, A^,^^ = 0. The 
inclusion of noise and beam smoothing will modify the covariance matrix of the power spectra measurements, equation <21> . which becomes 
lEisensteinetalll999l) 

Y' = i 2c™(cr + N7/{Bjf) {crf + {cr + N^/iBrmcr + Nf/iBrr) 2cricr + Nf/iBrr) i . oo) 



u 



2{crr 2cricr+Nr/iBrr) 2icr+Nr/{Brrr 

These expressions, which are for the case of uniform uncorrelated pixel noise, are most relevant for satellite experiments where the 
pixel noise covariance matrix is near-diagonal. However, the planned scanning strategy of the Planck satellite mission will result in a much 
larger integration time near the ecliptic poles, leading to an anisotropic noise map. This experiment, and ground-based experiments with more 
complicated noise properties, will require simulations to accurately quantify the posterior distribution for the system. Such considerations 
motivate the consideration of fits to the posterior distribution considered in the next section. 

5 FITTING TO THE POSTERIOR 

The complications discussed in the previous section for realistic data, and the deviations from Gaussian behaviour of the posterior discussed 
in Section|3| can be modelled using a fit to the posterior surface. Such fitting functions can also allow the posterior surface to be approximated 
without requiring the inversion of a covariance matrix for every model tested - the effect of a varying covariance matrix can be absorbed into 
the shape of the function. In Section lsHl we consider a number of possible forms for the posterior fit to a single auto-power spectrum. The 
extension to include polarisation data is considered in Section l5!2l A good choice of fitting function should be able to interpolate between 
the posterior distribution in the limit of an all-sky survey at \ow-l and in the limit of a multi-variate Gaussian distribution at high-Z. In 
order to compare the suitability of different fitting functions, we therefore choose to consider their ability to match the true distributions in 
these situations. As the inclusion of isotropic and uncorrelated Gaussian distributed pixel noise does not change the shape of the posterior 
distributions in either limit, we can ignore its contribution without loss of generality - the effect of isotropic noise can be trivially included 
in both the fits and in the limiting situations that we are testing against. 

5.1 Single mode auto-power spectra 

We now consider a number of possible fitting functions for the posterior distribution for a single mode of an auto-power spectrum. As in the 
previous section we will drop the explicit reference to a particular auto-power spectrum, as the formulae and concepts are valid for TT, EE 
or BB power spectra. We quan tify the shape s of different fits using an expansion of the posterior around the maximum Ci = (1 + e)^; (a 
variation of the method used in l Verde et alj20 03). 

For an all-sky, no-noise survey, equation J9j can be expanded to give 



- 2 In f{Ci\Ci)=u 



^2 0,3 



(31) 



ignoring an irrelevant additive constant. Note that this is not equal to equation 9 in I Verde et alj J2003l) because of the different expansion 
adopted (we expand around Ci rather than Ci). At first glance, it appears that changing the value of ;/ = {21 + 1) does not change the shape 
of the surface around the maximum as it affects all order terms equally. However, as i/ increases, the posterior at a fixed value of e increases. 
We therefore see that, although the overall shape does not change, the range of parameters with "21n/(C;|C;) below a fixed value reduces 
in size and the term of order e'^ dominates the behaviour. It is in this way that the distribution tends to a Gaussian as i^ — > cx3. 

For an all-sky survey, the curvature around the posterior maximum is 2{Ci)'^ /u. Given a distribution f{Ci\Ci), then the term of order 
e^, where Ci — (1 + e)^;, should equal i/e^/2 in order to match this curvature. The distributions that we now consider as approximations to 
the posterior surface have all been normalised to match this behaviour to order e^ . Note that these distributions depend on the model power 
and therefore intrinsically allow for the model dependence of the variances of the measured power spectra. In this case, a fixed covariance 
matrix should b e used with these posterior fits. An alternative approach would b e to recalculate the variance for each model and use an altered 
posterior shape iEfstathioj2004bHchallinor & Chorl2005l : lBrown et all2005l) . 

(1) First, we consider a F distribution for Ci with degrees-of- freedom equal to /i, thereby matching the shape of the likelihood. We have 
to be slightly careful as the distribution is usually defined in terms of the mean rather than the maximum. In terms of the maximum, Ci, 
assuming a F distribution gives 



(m - 2)C, 



L 2Ci 

where we have an extra (/x — 2)//i term in the exponent compared with equation ^6}. 
This leads to an expansion in e 
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■21n/(C,|C0oc(At-2) 



\n{Ci/Ci) + 



Ci-Ci 



C, 



(m-2) 



+ 0(6^ 



(33) 



This distribution is included to empliasise the fact that the posterior does not have a the same form as the likelihood - this can easily be seen 
by comparing equations <31> and <33> . 

(2) Next, we consider a Gaussian distribution in Ci with fixed variance - chosen to match the curvature around the distribution maximum 






(34) 



The mean and maximum of this fit coincide at Ci . Most of the distributions that we now consider result from replacing Ci and Ci in the right 
hand side of this expression with functions g{Ci) and g(Ci). 
(3) For example, setting g(Ci) — [Cif /Ci gives 



21n/(C,|C,)oc 



2{CiY 



{Ci - Ci) 



"-^-e' + Oie'^) 



(35) 



This distribution could also have been derived from equation J34> . by including a factor {Ci/CiY to make the variance a function of the 
model. Note that this distribution is not a Gaussian distribution in Ci with covariances proportional to (C;)^ - this would require an extra 
term in the posterior from the effect on the determinant of the covariance matrix. Additionally, this distribution is not the same as a Gaussian 
in {Cif' /Ci, which would require the inclusion of a Jacobian from the change of variables. 

(4) In a recent paper -Smith et al- 12006.') have suggested a form for the posterior with g{Ci) = 3{Ci)'^'^{Ci)~^'^, which gives 



2(Ci)^ 



+ 0{e') 



(36) 



The (Ci)~^' '^ formula was derived in lSmith et alj yOOg) from the third derivative of the posterior of an all-sky survey, and can be seen to 
recover the correct behaviour to order e^, matching equation <3H . 

(5) The standard log-normal distribution can be derived by setting g(Ci) = Ci In Ci, to give 



2ln f{Ci\Ci) ex — ^ [Ciln{Ci/Ci)Y = v 



^2 ,3 



(6) In analogy with equations <34t & <35t . we can consider g(Ci) — [CiY ln(C';)/C; (non-standard log-normal). 



21n/(C,|C0oc 



2(g: 



{C,, 



c. 



■\n{Ci/Ci\ 



^-^ + o(.^) 



(37) 



(38) 



This is the distribution obtained by setting the variance to be a function of the model to be tested in the standard log-normal distribution, 
equation <37> . 

(7) It is also possible to consider the offset log-normal distribution, where g{Ci) = Ci{l + a) ln(C; + aCi) 



2[n f{Ci\Ci)(x 



2(G) = 



Ci{l + a)ln{ 



Ci + aCi , 
Ci + aCi ' 



^ e« + 0(.^) 



2(1 + a) 



(39) 



Setting a = —1/4 matches the all-sky no noise behaviour to order e^. As a ^ oo, this distribution tends towards Gaussian form. 

(8) Because all of our definitions oi g(Ci) require the same curvature matrix to match the e^ behaviour of the true distribution, we could 
also consider a combination of 2 or more of them. For example, setting g(Ci) = aCi -^ (\ — a)Ci liiC; gives 



21n/(G|C0 



2(C, 



ACi - Ci) + (1 - a)Ci \u(CilCi)\ ' = V 



L + ^,3^Q.4. 



(40) 



As with the offset log-normal distribution, we can set the free parameter a = —1/3 to match the behaviour of the true distribution to order 
e^. For a ^ 1, the distri bution obviously tends towards a Gaussian form. This distribution, which we suggest calling a summed log-normal 
distribution, was used bv lPercivall i2005ft to model the large-scale structure power spectrum from the 2dF galaxy redshift survey. 

(9) An alternative procedure, adopted bv l Verde et al. (2003^ . is to combine different posteriors after calculation. Verde et alj i2003h con- 
sidered combining the posterior Vi of equation <35t and the posterior V2 of equation <37t . Matching the all-sky no-noise posterior shape of 
equation <31> requires 



21n/(C,iC'0(xipi + |p2 = !/ 



^2 9^3 



(41) 



We have presented a variety of possible fits to the posterior surface in order to highlight that, even for a single auto-power spectrum, the 
optimal choice is by no means certain, and will depend on the experiment being analysed. 
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5.2 Single mode combined temperature-polarisation spectra 

We now extend our consideration of fitting formulae for tlie posterior presented in Section l^TTI to include combined temperature and polari- 
sation data. In this section we only consider a single-mode, and our vector of model power spectra is Xc, with observed value Xc, defined 
in equation <20t . 

For an all-sky no-noise survey, the marginalised likelihood distribution for the temperature-polarisation cross-power spectrum C*™ was 
given in equation <19t . and has a form that is different from that of the auto-power spectra. It is therefore clear that the posterior predicted for 
C^^ will have a different shape from that of equation 0. In fact, the maximum in the posterior of the marginalised distribution no longer 
occurs at C;^^ = C™, so we cannot expand the marginalised distribution around the maximum. However, the general Wishart distribution 
presented in Section l3!2l does predict a maximum in the posterior at Xc ~ Xc, so we can expand around this point. We can write the 
posterior distribution as 



/(XclXc) 



[crcf 



(C,TE)2]-^/2 



exp 



C 



crcf^ 



icl 



^TT(^EE 



{C} 



The curvature matrix around the maximum can be calculated from the second derivatives of this distribution giving 



Yr^ 



2\CrC} 



{crY? 



(C'l 



I 

E] 

F 

TE\2 



(Ci 



'rr 



2[c7^cr + {cl 

-2crcr 



(42) 



(43) 



{Cj 



which is the inverse of the covariance matrix given in equation J21> at Xc = Xc. 

To compare fitting functions with this distribution, we will adopt the philosophy used in Section lsTTI for independent auto-power spectra, 
and expand around the maximum. The equations are simplified if we define 



/-<TT/jEE 



(^TT/^EE 



(44) 



First we expand the Wishart distribution in the direction of Cj'^ by fixing Cj^ 



cr & cf 



Cf^, and expanding in Cj'^ 



(1 + e)C, . In this direction. 



-2\iif{cr\cr)^u 



3(f 2 - 1) 



e' + O(e^) 



(45) 



In the limit as f — *■ 0, this tends towards the distribution given in equation <3H as expected. By symmetry, expanding Cf^ around Cf 
would give the same series expansion (the auto power spectra predict the same posterior shape). 



Expanding the Wishait distribution in the direction of Cj^ by fixing Cj'^ = C'^'^ & Cf ^ = Cf^, and setting C^^ = (1 + e)^^ 



gives 



21n/(Q™|a™)oc/. 



6^ 



3(f4 



+lle' + 0{e') 



(46) 



For f^ = 0, the distribution has a perfect Gaussian form. 

We now consider fitting this distribution using multi-variate extensions of the functions discussed in Section lsHl for auto-power spectra. 



must differ from that of C; and Cj . The offset log-normal distribution has the flexibility to allow for this change in shape, and we will 
focus on the multi-variate extension of this distribution. Formally, we will consider a distribution in 



ln(Ci 



TT 



TE/=<TE\ 



cni+a 

C'f'=(l + a'='=)ln(Cf'= 



'cr 



given by 

-21n/(Xc|Xc)oc(Zc 



Zc)'Yr'(Zc-Zc) 



(47) 



(48) 



The series expansion of this multi-variate distribution along the standard axes was given in equation <39> . Matching this expansion to 
equations <45> & <46> . gives 



-J(l + 3r^), 



= -; 2 



3(f^ 



(49) 



Note that the logarithmic terms in equation <47t can be ill-defined for certain models. This is true already for the offset-lognormal 
distribution commonly used to approximate the Cf^ likelihood distribution - in equation J47t . a^^ is a free parameter and can, in principle, 
take negative values leading to an ill-defined likelihood for particular models. However, letting a ^ oo (for -l-ve measured Ct) ov a -^ — oo 
(for -ve measured Ci) will make the distributions in equation <47> tend to a Gaussian, and the fit is well-defined for all models in this limit. 
In general, we're only concerned with the slightly non-Gaussian regime where ill-defined terms do not arise except for very extreme models. 
One should therefore consider the posterior distribution of equation J47> as only being valid for a subset of models - those models for which 
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Figure 2. Posterior distributions in different directions in parameter space calculated for the Wishail; distiibution (solid black line), a simple Gaussian fit to 
this distribution (dotted Une), and a log-normal distribution matched to the Wishart distribution (dashed line). Plots are presented for I = 7 (top two rows) 
and / = 100 (bottom two rows). For each value of I, the upper row shows the distribution along the auto and cross-power spectra, while the lower row shows 
the distribution expanded along the eigenvectors of the covariance matrix. Cosmological parameters were fixed at the best-fit values of the 6-parameter model 
fitted to the CMB and 2dFGRS data in .Sanchez e t aL. 1 2006.1 . 

the fit is well defined. AH other models are assumed to be highly unlikely. Note that this does not prohibit the use of this function as a fitting 
function - in fact, as can be seen in Fig.|2| it matches the all-sky likelihood well in the directions of the cross- and auto-power spectra. 

The distributions calculated adopting these parameters and the curvature matrix of equation <43t are plotted in Fig. |2| for a basic 
cosmological mo del with parameters s et at the best-fit values of the simplest 6-parameter model that adequately fits the CMB and 2dFGRS 
data presented in lSanchez et alJ i200€i) . We present distributions at two wavenumbers I — 7, and I — 100. Fixing the distribution shape in 
the directions of the power spectra leaves no free parameters in this simple fit as the covariance matrix is fixed to match the curvature around 
the peak. In addition to matching the distributions along the power spectra elements of Xc, the distribution should match the posterior 
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distribution in arbitrary directions in parameter space. In Fig.|2|we therefore also plot the distribution along the principal components of the 
covariance matrix for these two values of L As can be seen, fitting the distribution along the power spectra does not constrain the distribution 
in other directions with the same accuracy. This is an inadequacy of our assumption of replacing the power spectra in equation <34t with 
a more general function, rather than a problem in the accuracy of the function chosen. We can match the function arbitrarily well in the 
directions of the power spectra, but the discrepancy in other directions remains. For example, in addition to using the offset log-normal 
distribution, we have also considered simply defining new variables as the sum of powers of {Cf'^ — Cf'^), which can be adjusted to match 
the distribution to arbitrary order around the maximum. However, even in this case, we find similar problems to those encountered using 
the offset log-normal distribution. Had we fixed these simple fits in the directions of the principal components, then we would have found 
problems fitting along the directions of the power spectra. One can, of course, envisage constructing more complicated fitting functions which 
more accurately reproduce correlations between different power spectra. Such functions would, however, lack the simplicity (and therefore 
some of the appeal) of fits commonly used to model the temperature power spectrum posterior. 



6 MARGINALISING OVER NUISANCE PARAMETERS 

For real CMB datasets, one often needs to account for an uncertainty in the calibration of the experiment. This uncertainty is usually 
considered a nuisance parameter and is marginalised over. For simple assumptions about the form of the underlying power spectrum posterior 
and of the calibration error, we can perform this marginalisation analytically. Here, we consider the effect of the posterior shape on this 
marginalisation process. For this analysis, we focus on a single auto-power spectrum. As in previous sections, we drop the explicit dependence 
on TT, EE or BB when the formulae and derivation are valid for any of these three auto-power spectra. 

Consider an experiment where the observed data has a multiplicative "calibration" error, h that is known to have a Gaussian distribution 
((&) = 0, {h^) — (7b). If we know the calibration error, then the "true" observed power spectrum value can be recovered, (C'i)truo = (l+b)^;. 
The posterior distribution of Ci is then given by 



f{Ci\Ci,at,)^ dbf{Ci,b\Ci,at)= dbf{Ci\Ci,b,at)f{b\at) 



If f{Ci\Ci, b, at) has a Gaussian form with variance S then. 



f{C,\C,,b,at) = 



: exp 



-{Ci - (1 + b)Ci)S-\Ci - (1 + b)Ci) 



This marginalisation can be reduced by "completing the square" JBridle et alj2002f) to gi' 

f{Ci\Ci,ab) = {l + C,S-^Cial)-^/^f{Ci\Ci,b',a,), 



where 



b' = 



CiS-^Ci - CiS-^Ci 



(50) 



(51) 



(52) 



(53) 



is the value of the calibration that maximises f{Ci,b\Ci,ai,). The offset term (1 + CiS^^Cia'^)^^''^ in equation J52> arises because the 
variance S is independent of the calibration error. 

The procedure for analytic marginalisation is, however, dependent on the posterior distribution. For an auto-power spectrum of an all-sky 
no-noise survey, the joint distribution of Ci and b is given by 

u2 



f{Ci,b\Ci,at)o,{Ci)-'^'' 



exp 



i^(l + b)Ci 
2Ci 



1 



2nab 



■ exp 



2af 



(54) 



As in Section|5| we have ignored a contribution from uncorrelated Gaussian pixel noise, which can be easily included as it does not affect 
the shape of the posterior distribution. Completing the square (as for the Gaussian case above) gives the marginalised posterior. 



f{Ci\Ci,at)<x{Ci)- 



vl-i 



exp 



v[\ + 2b')Ci 
2Ci 



where 
&' = - 



2Ci ' 



(55) 



(56) 



is the value of the calibration error that maximises the distribution f{Ci,b\Ci,ab). As can be seen, analytic marginalisation is also trivial 
when the exact posterior distribution for an all-sky survey is used, rather than a Gaussian. Because of the skewness of this distribution, the 
result is offset - the value of the calibration error needed to mimic the effect of a full marginalisation is twice that which maximises the 
likelihood. This demonstrates the difference between marginalisation and simply taking the likelihood maximum. 

Our decision to adopt a Gaussian distribution for the calibration error for a single auto-power spectrum measurement was reasonably 
arbitrary, and we could instead, have considered a calibration error that is Gaussian in the pixel values. When considering the joint likelihood 
of temperature and polarisation data, the nature of the calibration error becomes increasingly important. For example, assuming independent 
Gaussian distributed temperature (b^) and polarisation (6^) calibration errors would give a calibration error on the TE power spectrum 
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with a distribution based on a modified Bessel function - the distribution of [(1 + &^)(1 + b^)Y'^ . Obviously, this would complicate the 
procedure for analytic marginalisation. Additionally, from a single experiment, the temperature and polarisation calibration errors will be 
highly correlated, which will further increase the complications. Given these issues, there is clearly little to be gained from working through 
a derivation of analytic marginalisation for combined temperature and polarisation power spectra. For a given experiment, it seems clear that 
numerical simulations would be required to quantify the effect on the posterior. 



7 DISCUSSION 

The ultimate goal of CMB experiments is to constrain cosmological models by comparison with theory. In the currently favoured inflationary 
based cosmological model, the temperature and polarisation fluctuations in the CMB are expected to be isotropic and approximately Gaussian 
distributed JLiddle & Lyth 2000) . In this case, the statistical properties of the CMB can be described completely by the auto- and cross-power 
spectra of the temperature and polarisation fields. This data compression greatly speeds up the process of comparing with large numbers of 
theoretical models. In this paper, we have reviewed the mathematical foundation for this comparison. In a Bayesian analysis the posterior, 
which determines the model constraints, is directly related to the likelihood of a set of data given a particular model; it is therefore important 
to characterise the likelihood for a given experiment. 

For an all-sky survey with uncorrelated Gaussian distributed pixel noise, we have shown that the joint likelihood of the four CMB 
power spectra is given by a Wishart distribution, a distribution commonly encountered when calculating covariance matrices from Gaussian 
distributed data. This distribution, which can most easily be written in terms of matrices of the data and model power spectra, provides 
the likelihood of the measured power spectra including the constraint of positive definiteness. The shape of the likelihood is significantly 
different from a multi-variate Gaussian at low order multipoles, although it tends towards a multi-variate Gaussian form at high multipoles. 
The Wishart distribution can be integrated to give marginal distributions for the individual auto- and cross-power spectra. For the auto-power 
spectra, these marginalised distributions reduce to the well known V functions. For the TE cross-power spectrum, the marginalised distribution 
is more complicated, but can be calculated. We find that the resulting distribution for TE is significantly different to the F distributions of 
the auto power spectra and is closer to (although still differs from) a Gaussian. We have compared the marginalised distributions with 
those empirically determined from simulated data, finding excellent agreement. Realistically, CMB observations that include polarisation 
measurements will simultaneously provide constraints on all of the auto- and cross-power spectra. Consequently, the marginal distributions 
are probably only of academic interest, and we need to consider the combined distribution of all of the different power spectra. 

Given the complications of noise and limited sky coverage in real CMB data (discussed in Section|4}, the distribution of measured power 
spectra will, in general, deviate from a Wishart distribution to some degree. However, for a moderate sky cut, it is possible to recover the true 
temperature auto-power spectrum on large scales. In the case of polarisation data, large uncertainties in the level of polarised foregrounds are 
currently a limiting factor for CMB experiments. If our understanding of such foregrounds improves sufficiently, then it may be possible to 
recover the full-sky versions of all four CMB power spectra from future experiments. In that case, the Wishart distribution will be the correct 
distribution to use for comparing models and data on large scales. 

To account for the effect of limited sky coverage on the posterior shape, it has become common practice to model an empirically deter- 
mined posterior shape using simple fitting functions. We have considered a number of different fits to the posterior for a single auto-power 
spectrum in Section lsiTI and have extended these simple 1-dimensional fits to cover the combined analysis of temperature and polarisation 
data. The most commonly used fitting function for auto-power spectra is the offset log-normal distribution. In Section BjI it was shown 
that this is a member of a wider class of models that form a particular extension of the standard Gaussian posterior form. Consequently, the 
extension of the log-normal distribution to consider the combination of polarisation and temperature data is straightforward, following the 
natural extension of the Gaussian distribution to a multi-variate Gaussian. A good test of the validity of a fitting function is its ability to match 
the known posterior distribution for an all-sky survey. For the multi-variate analogue of the log-normal distribution there is a problem, not in 
its ability to fit to the posterior in the direction of a particular power spectrum, but in fitting correlations between the different power spectra. 
In fact, we have argued that this problem must be fundamental to the class of models which adopt the same form for the posterior (in the 
notation of Section l5.2l where some g{Ci) is adopted). 

An alternative approach to fitting to the power spectra is to directly fit the pixel data. This has the advantage that the pixel values have 
a multi-variate Gaussian distribution and the posterior shape is therefore well known and simple to characterise. However, since it requires 
the inversion of a TVpix x A'^pix pixel-pixel covariance matrix, this method can only be easily employed on CMB maps with relatively coarse 
pixelisation, and can therefore only be used to probe the largest scales. To probe smaller scales, more resolution is required and the method 
rapidly becomes computationally unfeasible with increasing number of pixels. The WMAP team in their 3-year data analysis adopted a 
hybrid approach where the pixel data were directly fitted on large scale s (Z ^ 12 for tem perature and Z ^ 23 for polarisation), and the TT 
and TE power spectra were fitted on smaller scales (Z > 12 and Z > 23: lHinshaw et all200a : IPage et all2006l) . On large scales, this has the 
attractive benefit of avoiding the issues arising from the complicated posterior shape for the power spectra - the real world complications 
discussed in Section|4]do not distort the likelihood of the pixel values from a multi-variate Gaussian distribution. 

In Section |6| we have applied our analysis of posterior shapes to consider marginalisation over nuisance parameters, focusing on an 
unknown calibration error. Given a form for the posterior distribution, it is possible to perform this marginalisation analytically, leading to 
a simple correction to the posterior for this "nuisance" parameter. This analytic correction avoids having to explicitly perform the integra- 
tion. Obviously the analytic form is dependent on the posterior shape, and it has previously been common to assume a Gaussian posterior 
JBridle et al.l2002i) . In Section|6| we provide an additional calculation using the true likelihood of an all-sky survey. Because of the offset 
nature of this distribution we find a different formula for analytically performing the marginalisation. 
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The analysis presented in this paper has higMighted the issues involved in a likelihood analysis of combined temperature and polarisation 
power spectra, and provides the first step on the way to providing a well-characterised method for the fast analysis of combined temperature 
and polarisation data from future experiments. In subsequent papers we intend to build on this work by considering the practical application 
of these techniques an d the possible modifications needed for analysing upcoming joint temperature and polarisation experiments such as 
the Planck experiment iTauben2004) . With the precision with which future experiments will measure the CMB temperature and polarisation 
fields, the likelihood techniques presented in this paper will become increasingly important for accurately constraining cosmological models 
from these data. 
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